Title: mdmerge Specifications

[stoc]: #spectoc "Section Contents" class=backref

# mdmerge Specifications [&#x21d1;][rtoc] [spectoc]

* [unresolved issues][specissues]
* [filepaths][specpaths]
* [Interpreting index files][specidxs]
    - [LeanPub index files][]
    - [mmd_merge index files][]
* [Interpreting include declarations][specdecl]
    - [MultiMarkdown transclusion syntax][]
    - [Marked inclusion syntax][]
    - [Marked raw file inclusion syntax][]
    - [LeanPub code inclusion syntax][]
* [Command Line Syntax][speccli]
    - [options][]

## unresolved issues [&#x21d1;][stoc] [specissues]

*Note: These are just issues I've thought about addressing; no active plans to fix these at the moment.*

* error handling should be configurable ... output html error page, output system exit code, fail silently, et cetera.
* handle missing files in CLI better
* need unit tests for absolute file paths
* need unit tests for home (`~`) file paths
* perhaps it should strip trailing line breaks? (i.e. so that included files only have one terminating `'\n'`)

## filepaths [&#x21d1;][stoc] [specpaths]

Unless otherwise specified, all file paths take one of these forms:

* Absolute (starts with `/`)
* Home (starts with `~/`), in which the tilde means the user's home directory
* Relative (starts with anything other than `/` or `~/`), resolved relative to the file that includes them.
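
The three path forms above could be resolved along these lines. This is only a minimal sketch: the function name `resolve_filepath` and its parameters are hypothetical and not taken from the mdmerge source.

```python
import os

def resolve_filepath(filepath, including_file):
    """Resolve `filepath` as written inside `including_file` (hypothetical helper)."""
    if filepath.startswith("~/"):
        # Home path: the tilde stands for the user's home directory.
        return os.path.expanduser(filepath)
    if filepath.startswith("/"):
        # Absolute path: used exactly as given.
        return filepath
    # Relative path: interpreted against the directory of the including file.
    base_dir = os.path.dirname(os.path.abspath(including_file))
    return os.path.normpath(os.path.join(base_dir, filepath))
```

For example, `resolve_filepath("ch1/intro.md", "/docs/book.txt")` would look for the file under `/docs/ch1/`.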

## Interpreting index files [&#x21d1;][stoc] [specidxs]

### LeanPub index files

If the first line of the file is `frontmatter:` then the file is recognized as a LeanPub index file. If the filename is `book.txt` and the CLI option `--leanpub` was used, then the file is also recognized as a LeanPub index file (whether or not it contains a `frontmatter:` line).

The processing rules for **LeanPub index files** are:

1. The remaining lines are treated as filepaths of files to be merged together.
1. Blank lines are ignored
1. Lines beginning with a `#` are ignored
1. Lines consisting of `frontmatter:`, `mainmatter:`, and `backmatter:` are ignored.
1. Lines that specify invalid file syntax or specify files that don't exist will be ignored, except that a warning message will be written to stderr.
1. Leading whitespace is ignored
1. A blank line is inserted between merged files

The filepaths in the index file can be absolute, relative to the index file, and can use the `~` syntax for the user's home directory.
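
The recognition and filtering rules above can be sketched as follows. The names `is_leanpub_index` and `leanpub_entries` are hypothetical, and the sketch assumes the index has already been read into a list of lines; checking that each file exists (and warning on stderr when it does not) is left out.

```python
import os

SECTION_MARKERS = {"frontmatter:", "mainmatter:", "backmatter:"}

def is_leanpub_index(lines, filename="", leanpub_flag=False):
    """Return True if the lines look like a LeanPub index file."""
    first = lines[0].strip() if lines else ""
    if first == "frontmatter:":
        return True
    # With --leanpub, any file named book.txt counts as an index file.
    return leanpub_flag and os.path.basename(filename) == "book.txt"

def leanpub_entries(lines):
    """Return the filepaths listed in a LeanPub index, in order."""
    entries = []
    for line in lines:
        entry = line.strip()  # leading whitespace is ignored
        if not entry or entry.startswith("#") or entry in SECTION_MARKERS:
            continue  # blank lines, comments and section markers are skipped
        entries.append(entry)
    return entries
```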

### mmd_merge index files

If the first line of the file is `#merge` then the file is recognized as an mmd_merge index file.

The processing rules for **mmd_merge index files** are:

1. The remaining lines are treated as filepaths of files to be merged together.
1. Blank lines are ignored
1. Lines beginning with a `#` are ignored
1. Lines consisting of `frontmatter:`, `mainmatter:`, and `backmatter:` are ignored.
1. Lines that specify invalid file syntax or specify files that don't exist will be ignored, except that a warning message will be written to stderr.
1. Leading whitespace is significant: each tab or every 4 spaces is treated as an increase in header level (so a single tab indent would convert an h1 to an h2; a double tab indent would convert an h1 to an h3).
1. A blank line is inserted between merged files
    
    From the in-source documentation of the mmd_merge utility:

    > This script is designed to allow you to use different files to store parts of a larger MultiMarkdown document, making it easier to reorganize the document if you so desire. Each line consists of the url or filename of the next document. If the line is indented, each tab (or 4 spaces) will increase the header level of the document by 1 (similar to the `Base Header Level` metadata). Blank lines are ignored. Lines starting with `#` are treated as comments. Only metadata from the first file would be properly handles, since it would be the only metadata at the top of the new "virtual" document.

The filepaths in the index file can be absolute, relative to the index file, and can use the `~` syntax for the user's home directory.
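
As an illustration of the indentation rule, the sketch below computes the header-level increase for each entry and applies it to ATX-style headers. Both function names are hypothetical, and two simplifications are assumptions of this sketch rather than documented behavior: header levels are capped at 6, and `#` lines inside code blocks are not treated specially.

```python
import re

def mmd_merge_entries(lines):
    """Return (filepath, header_level_increase) pairs from an mmd_merge index."""
    entries = []
    for line in lines:
        text = line.rstrip()
        stripped = text.lstrip(" \t")
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments (including the `#merge` line)
        indent = text[: len(text) - len(stripped)]
        # Each tab, or every 4 spaces, raises the header level by one.
        level = indent.count("\t") + indent.count(" ") // 4
        entries.append((stripped, level))
    return entries

def shift_headers(text, level):
    """Increase the level of every ATX header in `text` by `level`."""
    def bump(match):
        return "#" * min(6, len(match.group(1)) + level)  # cap is an assumption
    return re.sub(r"^(#{1,6})(?=\s)", bump, text, flags=re.MULTILINE)
```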

## Interpreting include declarations [&#x21d1;][stoc] [specdecl]

### MultiMarkdown transclusion syntax

The transclusion syntax was introduced in MultiMarkdown 4.5. An include declaration is matched as a **MultiMarkdown file transclusion** if it has the following form:

1. A blank line, followed by 
1. A line containing only `{{filepath}}`, followed by 
1. Another blank line.

The filepath can be absolute, relative to the root file, and can use the `~` syntax for the user's home directory.

Files included in this manner will themselves be examined for further include declarations or as index files.
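
One way to detect this form is a line-oriented scan, sketched below. The name `find_transclusions` is hypothetical, and treating the start or end of the text as equivalent to a blank line is an assumption of the sketch.

```python
import re

# A line consisting solely of {{filepath}}.
TRANSCLUSION = re.compile(r"^\{\{(?P<path>[^{}]+)\}\}$")

def find_transclusions(text):
    """Return (line_index, filepath) for each blank-line-delimited transclusion."""
    lines = text.split("\n")
    found = []
    for i, line in enumerate(lines):
        match = TRANSCLUSION.match(line.rstrip())
        if not match:
            continue
        prev_blank = i == 0 or not lines[i - 1].strip()
        next_blank = i == len(lines) - 1 or not lines[i + 1].strip()
        if prev_blank and next_blank:
            found.append((i, match.group("path")))
    return found
```

A `{{...}}` that appears mid-sentence, or directly under a code fence, is not reported by this scan because it is not surrounded by blank lines.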

Another format matching MultiMarkdown file transclusion is:

1. A blank line, followed by 
1. A code fence start (e.g. `` ``` `` or `` ```python `` or similar), followed by
1. A line containing only `{{filepath}}`, followed by 
1. A code fence termination (e.g. `` ``` ``), followed by
1. Another blank line.

The filepath can be absolute, relative to the root file, and can use the `~` syntax for the user's home directory.

Files matched in this manner will not be examined for further include declarations or index file specification. They will be included as is, including trailing line breaks.

In either case the filename may be specified with a wildcard file extension (e.g., `name.*`). MultiMarkdown itself would substitute an extension based on the export format. This utility substitutes the wildcard in one of these ways:

* If the command line flag `--export-target html` is specified, replace `.*` with `.html`
* If the command line flag `--export-target latex` is specified, replace `.*` with `.tex`
* If the command line flag `--export-target lyx` is specified, replace `.*` with `.lyx`
* If the command line flag `--export-target opml` is specified, replace `.*` with `.opml`
* If the command line flag `--export-target rtf` is specified, replace `.*` with `.rtf`
* If the command line flag `--export-target odf` is specified, replace `.*` with `.odf`
* If the command line flag `--export-target` is unspecified, replace `.*` with the same extension as the current file (the file containing the include declaration)
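
The substitution table above maps directly onto a lookup, sketched here. The function name `substitute_wildcard` and its parameters are hypothetical; only the target-to-extension pairs come from the list above.

```python
import os

EXPORT_EXTENSIONS = {
    "html": ".html",
    "latex": ".tex",
    "lyx": ".lyx",
    "opml": ".opml",
    "rtf": ".rtf",
    "odf": ".odf",
}

def substitute_wildcard(filepath, current_file, export_target=None):
    """Replace a trailing `.*` in `filepath` according to the export target."""
    if not filepath.endswith(".*"):
        return filepath  # no wildcard, nothing to do
    if export_target is None:
        # No --export-target: reuse the extension of the including file.
        extension = os.path.splitext(current_file)[1]
    else:
        extension = EXPORT_EXTENSIONS[export_target]
    return filepath[:-2] + extension
```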

### Marked inclusion syntax

A declaration is matched as a **Marked inclusion** if it has the form:

1. A blank line, followed by 
2. A line containing only `<<[filepath]`, followed by 
3. Another blank line.

The filepath can be absolute, relative to the root file, and can use the `~` syntax for the user's home directory.

### Marked raw file inclusion syntax

A declaration is matched as a **Marked raw file inclusion** if it has the form:

1. A blank line, followed by
2. A line containing only `<<{filepath}`, followed by
3. Another blank line.

The filepath can be absolute, relative to the root file, and can use the `~` syntax for the user's home directory.

Raw file processing is affected by the `--just-raw` CLI option. If it is not specified, then all raw file specification lines are converted to HTML comment lines, like `<!-- <<{filepath} -->`. However, if the `--just-raw` CLI option is specified, then **only** raw file specifications are processed (including those inside HTML comments) and other kinds of include specifications are ignored.

The specified file is just inserted as is into the output text. Raw files are not processed for additional include statements.
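
The two modes can be sketched as a pair of small helpers. Both names are hypothetical, and for brevity the sketch does not check for the surrounding blank lines.

```python
import re

# Bare raw include line: <<{filepath}
RAW_INCLUDE = re.compile(r"^<<\{(?P<path>[^{}]+)\}$")
# Raw include, optionally wrapped in an HTML comment (as seen with --just-raw).
RAW_INCLUDE_ANY = re.compile(r"^(?:<!--\s*)?<<\{(?P<path>[^{}]+)\}(?:\s*-->)?$")

def comment_out_raw_includes(text):
    """Default mode: wrap each raw include line in an HTML comment."""
    out = []
    for line in text.split("\n"):
        if RAW_INCLUDE.match(line.rstrip()):
            out.append("<!-- " + line.rstrip() + " -->")
        else:
            out.append(line)
    return "\n".join(out)

def raw_include_path(line):
    """--just-raw mode: return the filepath of a raw include line, else None."""
    match = RAW_INCLUDE_ANY.match(line.strip())
    return match.group("path") if match else None
```

The comment wrapper lets the declaration pass through a Markdown processor untouched, so that a later `--just-raw` pass over that output can still find it.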

### LeanPub code inclusion syntax

A declaration is matched as **LeanPub code inclusion** if it has the form:

1. A blank line, followed by 
2. A line containing only `<<(filepath)`, followed by 
3. Another blank line.

The filepath can be absolute, relative to the root file, and can use the `~` syntax for the user's home directory.

A declaration is also matched as LeanPub code inclusion if it has the form:

1. A blank line, followed by 
2. A line containing only `<<[code title](filepath)`, followed by 
3. Another blank line.

The filepath can be absolute, relative to the root file, and can use the `~` syntax for the user's home directory.

In this form, however, the text in `[code title]` is *not* used as a caption (although that is what LeanPub would do); it is ignored.

Note: Unlike Marked inclusion, wildcard file extension substitution is not performed with LeanPub code inclusion.
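
Both LeanPub forms can be covered by one pattern with an optional title, as in this sketch. The name `leanpub_code_path` is hypothetical, and the blank-line check is omitted.

```python
import re

# <<(filepath) or <<[code title](filepath); the title is matched but discarded.
LEANPUB_CODE = re.compile(r"^<<(?:\[(?P<title>[^\]]*)\])?\((?P<path>[^()]+)\)$")

def leanpub_code_path(line):
    """Return the filepath of a LeanPub code inclusion line, else None."""
    match = LEANPUB_CODE.match(line.rstrip())
    return match.group("path") if match else None
```

A Marked inclusion such as `<<[chapter.md]` does not match, because the parenthesized filepath is required.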

## Command Line Syntax [&#x21d1;][stoc] [speccli]

The command line looks like this:

```no-highlight
mdmerge [options] [-o outputfile] inputfiles
mdmerge [options] [-o outputfile] -
```

### options

`options`
:   One or more of `--book`, `--export-target`, `--ignore-transclusions`, `--just-raw`, `--leanpub`, `--version`, `--help`, `-h`, `-?`.

`--book`
:   Treat STDIN as an index file (a "book" file).

`--export-target [html|latex|lyx|opml|rtf|odf]`
:   Indicates the ultimate output target of the markdown processor, but primarily impacts wildcard substitution in Marked inclusion.

`--ignore-transclusions`
:   Leave any MultiMarkdown transclusion specifications alone; do not include
the specified file. Useful if you want to mix Marked/LeanPub includes and
MultiMarkdown includes, but have MultiMarkdown handle the transclusions.

`--just-raw`
:   Ignore all include specifications except for raw includes; useful for
processing the output of the Markdown processor to pick up the raw file include
specifications that should have passed through untouched.

`--leanpub`
:   Indicates that any input file named `book.txt` should be treated as a LeanPub index file.

`--version`
:   Gives the version information about the utility.

`-o outputfile`
:   The filepath in which to store the merged text. If not specified, then STDOUT is used.

`--outfile outputfile`
:   Same as `-o`.

`inputfiles`
:   A list of space-separated input files to be merged together. If multiple files are given, they are treated as if they were specified in a LeanPub index file.

`-`
:   The input comes from STDIN.

`--help`
:   Help information

`-h`
:   Help information

`-?`
:   Help information
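
For readers who want to see how this option set fits together, here is a sketch using Python's standard `argparse` module. It is not the utility's actual parser: the function name `build_parser`, the help strings, and the version text are placeholders.

```python
import argparse

def build_parser():
    """Build a parser mirroring the options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="mdmerge", add_help=False)
    parser.add_argument("--book", action="store_true",
                        help="treat STDIN as an index file")
    parser.add_argument("--export-target",
                        choices=["html", "latex", "lyx", "opml", "rtf", "odf"],
                        help="output target used for wildcard substitution")
    parser.add_argument("--ignore-transclusions", action="store_true",
                        help="leave MultiMarkdown transclusions untouched")
    parser.add_argument("--just-raw", action="store_true",
                        help="process only raw file includes")
    parser.add_argument("--leanpub", action="store_true",
                        help="treat book.txt as a LeanPub index file")
    parser.add_argument("--version", action="version",
                        version="%(prog)s (version placeholder)")
    parser.add_argument("-o", "--outfile", metavar="outputfile",
                        help="output filepath; STDOUT if omitted")
    parser.add_argument("--help", "-h", "-?", action="help",
                        help="show help information")
    # A lone "-" is accepted as a positional value and means STDIN.
    parser.add_argument("inputfiles", nargs="*",
                        help="input files, or - for STDIN")
    return parser
```

Parsing `["--leanpub", "-o", "out.md", "book.txt"]` with this parser yields `leanpub=True`, `outfile="out.md"` and `inputfiles=["book.txt"]`.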
